We study the evolution of mutation rates for an asexual population living on a static fitness
landscape, consisting of multiple peaks forming an evolutionary staircase. The optimal mutation
rate is found by maximizing the diffusion towards higher fitness. Surprisingly, the optimal genomic
copying fidelity is given by $Q_{opt} = e^{-1/\ln \nu}$ (where $\nu$ is the genome length), independent of all other
parameters in the model. Simulations confirm this theoretical result. We also discuss the relation 
between the optimal mutation rate on static and dynamic fitness landscapes. 



Evolution on the molecular level can be viewed as a diffusion process. The equations
describing the time dynamics of a population of gene sequences are a set of discrete
diffusion equations with an exponential growth term. The diffusion stems from inaccurate
copying of the genome during replication. This enables the population to explore the
sequence space, i.e., the space spanned by all possible gene sequences. Point mutations
make the Hamming distance a natural metric on sequence space, which becomes topologically
equivalent to a hypercube of dimension $\nu$, where $\nu$ is the genome length. The high
dimensionality makes analysis of the general diffusion process difficult. In this paper we
focus on the evolution along a specified path in the hypercube and disregard the dynamics
of all other gene sequences. This gives a one-dimensional sequence space. We are interested
in the optimal mutation rate, which is defined as the mutation rate that maximizes the
diffusion speed.

The genome codes mainly for proteins which regulate the chemical reactions within the
cell. One of the processes under genomic control is the replication of the genome itself.
When the genetic material is copied, replicase enzymes are involved. This is important
since an unguided base pairing process is highly inaccurate. The enzymes are determined by
the genome and the mutation rate of the organism is therefore under evolutionary control.
This implies that the mutation rates observed in living organisms have been selected for
by Darwinian evolution.

Naively one may think that since most mutations that affect the fitness are deleterious,
organisms should evolve mutation rates that are as low as possible. Measurements of
mutation rates, however, show that organisms have copying fidelities much below what could
be expected from this assumption [1,2]. They also show that the genomic mutation rate,
i.e., the probability of one or more mutations occurring during one replication of the
whole genome, is approximately constant within similar groups of organisms. This is
surprising since the copying of the genetic material is a local process and it is the
mutation rate per base pair that is directly affected by the replicase enzymes. Most
attempts to find an evolutionary explanation for the observed mutation rates have been
based on populations evolving in a changing environment, see e.g., [3-10]. It is easy to
understand that a non-zero mutation rate is selected for on a dynamic fitness landscape,
since perfect copying would prevent adaptation to new conditions. Recently a theoretical
study has shown that the optimal genomic copying fidelity in a dynamic environment is
approximately independent of genome length [10]. The theory also predicts mutation rates
of the same order of magnitude as observed for simple DNA-based organisms. In this paper
we study a different model. The population lives in a static environment, but starts far
from the global fitness maximum. A non-zero mutation rate is selected for by maximizing
the rate of evolution towards fitter genotypes.

Consider an asexual haploid population of individuals, represented by genomes of length
$\nu$. The fitness landscape consists of a number of peaks with superior fitness surrounded
by a background. The evolution on this landscape is driven by mutations enabling jumps from
one fitness peak to a higher peak in the close neighborhood. We study a population of $N$
gene sequences starting at a low fitness peak which then mutate onto successive fitness
peaks of increasing height ($\sigma_1 < \sigma_2 < \cdots$). Furthermore we assume the
copying fidelity per base, $q$, to be constant over the genome. The probability of a gene
sequence to copy onto itself during one replication event, the genomic copying fidelity, is
then given by $Q = q^\nu$. We also assume the probability of an individual on peak
$\sigma_{i-1}$ to produce an offspring on peak $\sigma_i$ during a replication event to be
$p_i (1-q)^{\alpha_i} q^{\nu-\alpha_i}$. This means that the number of bases where the
sequences defining peak $\sigma_{i-1}$ and $\sigma_i$ differ is $\alpha_i$. The factor $p_i$
is an arbitrary combinatorial factor, accounting for possible redundancies in sequence
space, alphabet size, etc. All higher fitness peaks, $\sigma_k$ for $k > i$, are assumed to
be further away so that mutations from peak $\sigma_{i-1}$ can be neglected. The evolution
of the relative concentrations $x_n$ is described by the differential equations

$$
\begin{aligned}
\dot{x}_1 &= W_{1,1}\,\theta_N(x_1) + W_{1,2}\,\theta_N(x_2) - f x_1 \\
\dot{x}_2 &= W_{2,1}\,\theta_N(x_1) + W_{2,2}\,\theta_N(x_2) + W_{2,3}\,\theta_N(x_3) - f x_2 \\
&\;\;\vdots \\
\dot{x}_n &= W_{n,n-1}\,\theta_N(x_{n-1}) + W_{n,n}\,\theta_N(x_n) + W_{n,n+1}\,\theta_N(x_{n+1}) - f x_n
\end{aligned}
\tag{1}
$$

where the function $\theta_N$ is defined as $\theta_N(x_n) = x_n$ if $x_n > 1/N$ and $0$
otherwise, and therefore accounts for the limited population size. The factor
$f = \sum_i \left( q^\nu + p_i (1-q)^{\alpha_i} \right) \sigma_i \, \theta_N(x_i)$ ensures
$x_i$ to be normalized as relative concentrations. The matrix elements of $W$ are given by



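$$
\begin{aligned}
W_{n-1,n} &= p_n (1-q)^{\alpha_n} q^{\nu-\alpha_n} \sigma_n \\
W_{n,n+1} &= p_{n+1} (1-q)^{\alpha_{n+1}} q^{\nu-\alpha_{n+1}} \sigma_{n+1}
\end{aligned}
\tag{2}
$$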



We start with a population that consists of individuals on the first peak $\sigma_1$, i.e.,
we define the initial values as

$$
x_i(0) = \begin{cases} 1 & i = 1 \\ 0 & i \neq 1 \end{cases}
\tag{3}
$$




FIG. 1. The time dynamics of Eq. (1) is simulated numerically. When the population diffuses
off the initial peak $\sigma_1$ it starts evolving to peaks with higher and higher fitness.
The parameters used in this plot are $\nu = 100$, $\sigma_i = i$, $p = 0.01$, $Q = 0.99$ and
$N = 10^6$.
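
The kind of numerical experiment summarized in Fig. 1 can be sketched with a forward-Euler integration of Eq. (1). This is only an illustration under stated assumptions, not the authors' code: the diagonal element $W_{n,n} = q^\nu \sigma_n$, the forward element $W_{n+1,n} = p(1-q)^{\alpha} q^{\nu-\alpha}\sigma_n$, the choice $\alpha = 1$, the use of the total production as the normalizing factor $f$, the step size, and the function name `simulate_staircase` are all choices made here.

```python
def simulate_staircase(nu=100, Q=0.99, p=0.01, alpha=1, N=1e6,
                       n_peaks=8, t_end=60.0, dt=0.01):
    """Forward-Euler integration of Eq. (1) on a chain of peaks with sigma_i = i.

    Returns the final concentrations and the times at which the most
    populated peak changes.
    """
    q = Q ** (1.0 / nu)                               # per-base fidelity, from Q = q**nu
    hop = p * (1 - q) ** alpha * q ** (nu - alpha)    # chance to land on a neighbouring peak
    sigma = [float(i + 1) for i in range(n_peaks)]    # sigma_i = i, as in Fig. 1
    x = [1.0] + [0.0] * (n_peaks - 1)                 # Eq. (3): everyone on the first peak
    switches, leader = [], 0
    for step in range(int(round(t_end / dt))):
        # theta_N: peaks holding less than one individual do not reproduce
        active = [xi if xi > 1.0 / N else 0.0 for xi in x]
        gain = [0.0] * n_peaks
        for m in range(n_peaks):
            offspring = sigma[m] * active[m]
            gain[m] += Q * offspring                  # assumed diagonal element q**nu * sigma_m
            if m + 1 < n_peaks:
                gain[m + 1] += hop * offspring        # assumed forward element W[m+1][m]
            if m > 0:
                gain[m - 1] += hop * offspring        # back mutation W[m-1][m], cf. Eq. (2)
        f = sum(gain)                                 # assumed: total production keeps sum(x) = 1
        x = [xi + dt * (g - f * xi) for xi, g in zip(x, gain)]
        new_leader = max(range(n_peaks), key=lambda i: x[i])
        if new_leader != leader:
            switches.append(((step + 1) * dt, new_leader + 1))
            leader = new_leader
    return x, switches


final_x, switches = simulate_staircase()
for t, peak in switches:
    print(f"t = {t:6.2f}: peak {peak} becomes the most populated")
```

With these settings the population climbs the staircase one peak at a time. The exact switching times depend on the assumed matrix elements and on how close $p(1-q)$ is to the cutoff $1/N$.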

The infinite population size limit of Eq. (1) corresponds to a discrete normalized
one-dimensional diffusion equation with an exponential growth term. However, this limit is
not interesting for realistic systems since it does not allow propagating distributions of
concentrations localized in sequence space. If the fitness grows faster than linearly, for
example, the concentrations on fitness peaks far from the starting point grow large before
the concentrations on peaks closer to the origin. This bizarre effect stems from the
exponential growth of very small (exponentially decaying with the distance from the origin)
but non-zero concentrations over all the fitness peaks shortly after the start.
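
The effect can be illustrated with a crude leading-order estimate. Assume (this is a simplification introduced here, not a formula from the text) that without a population cutoff the unnormalized concentration on the peak $k$ steps away from the origin behaves like $\gamma^k e^{Q \sigma_k t}$, where $\gamma$ is the per-step mutation factor and all prefactors are ignored. Peak $k$ then outgrows the starting peak at $t_k = k \ln(1/\gamma) / \big(Q(\sigma_k - \sigma_0)\big)$. The values of `gamma` and `Q` and the two fitness profiles below are illustrative.

```python
import math

def takeover_times(sigma, gamma, Q, n_peaks=6):
    """Crude time at which peak k outgrows the starting peak (k = 0) when no
    population cutoff is applied: gamma**k * exp(Q*(sigma(k)-sigma(0))*t) = 1."""
    return [k * math.log(1.0 / gamma) / (Q * (sigma(k) - sigma(0)))
            for k in range(1, n_peaks)]

gamma, Q = 1e-6, 0.99                                  # illustrative values
linear = takeover_times(lambda k: 1.0 + k, gamma, Q)          # sigma_k grows linearly
quadratic = takeover_times(lambda k: 1.0 + k * k, gamma, Q)   # faster than linear

for k, (t_lin, t_quad) in enumerate(zip(linear, quadratic), start=1):
    print(f"peak {k}: linear t = {t_lin:6.2f}, quadratic t = {t_quad:6.2f}")
```

In this estimate a linear fitness profile makes all peaks take over at the same time, while a quadratic profile makes the most distant peaks take over first, which is the unphysical ordering described above.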

In this model we implicitly assume the mutation rates to evolve much more slowly than the
fitness, i.e., there are no significant changes in the mutation rate during the evolution
from one fitness peak to the next peak.



The optimal copying fidelity $q_{opt}$ is defined by maximizing the diffusion speed towards
genotypes with superior fitness. Mathematically this corresponds to minimizing the time $T$
it takes for the concentration $x_n$ on peak $\sigma_n$ to reach its maximum, when the
population starts at the preceding peak $\sigma_{n-1}$. At the time when mutants from peak
$\sigma_{n-1}$ have enabled the concentration $x_n$ to become large enough, i.e.,
$x_n > 1/N$, exponential growth will start with initial concentration proportional to
$p_n (1-q)^{\alpha_n}$. Since the population at this time is localized around peak $n$, the
concentration $x_n$ is described approximately by



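$$
x_n(t) = \frac{\gamma\, e^{q^\nu \sigma_n t}}
{e^{q^\nu \sigma_{n-1} t} + \gamma\, e^{q^\nu \sigma_n t} + \gamma^2 e^{q^\nu \sigma_{n+1} t}}
\tag{4}
$$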



where $\gamma = p_n (1-q)^{\alpha_n}$. The denominator normalizes $x_n$ by summing the
absolute growth in the surrounding of peak $n$, see Fig. 2. The time $T$ when $x_n(t)$ has a
maximum can be found by solving $\frac{dx_n}{dt} = 0$, giving

$$
T = -\frac{\ln(\gamma^2 \kappa)}{(\sigma_{n+1} - \sigma_{n-1})\, q^\nu}
\tag{5}
$$

where $\kappa = \frac{\sigma_{n+1} - \sigma_n}{\sigma_n - \sigma_{n-1}}$. The diffusion speed
is defined as $V = 1/T$. By making the approximation $\kappa \approx 1$, we can write

$$
V = -\frac{(\sigma_{n+1} - \sigma_{n-1})\, q^\nu}{2 \ln(\gamma)}
\tag{6}
$$

FIG. 2. The picture shows $x_4$ given by Eq. (4). The relevant parameters are the same as in
Fig. 1: $\nu = 100$, $\sigma_i = i$, $p = 0.01$ and $Q = 0.99$. The maximum occurs at time
$T \approx 14$ (defined as the time from the last peak's maximum). This is in agreement with
the numerical solutions shown in Fig. 1.
FIG. 3. The figure shows $V(q)$ given by Eq. (6). The maximum gives the optimal copying
fidelity $q_{opt}$. Parameters used in the figure are $\nu = 100$, $\sigma_i = i$ and
$p = 0.01$. The shape of the curve is not sensitive to the parameter values, as long as
$\nu \gg 1$.
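
As a numerical cross-check of the expressions for $T$ and $V$, the sketch below evaluates $T = -\ln(\gamma^2\kappa) / \big((\sigma_{n+1}-\sigma_{n-1})\,q^\nu\big)$ for the parameters quoted for Fig. 2 and then scans $V(q) = -(\sigma_{n+1}-\sigma_{n-1})\,q^\nu / (2\ln\gamma)$ over a grid of genomic fidelities $Q$. The value $\alpha = 1$ is an assumption (the captions of Figs. 1 to 3 do not state it), and the function names and the grid are choices made here.

```python
import math

def gamma(q, p=0.01, alpha=1):
    """Mutation factor gamma = p * (1 - q)**alpha (alpha = 1 is assumed)."""
    return p * (1 - q) ** alpha

def time_to_maximum(q, nu, s_prev, s_n, s_next, p=0.01, alpha=1):
    """Eq. (5): time for x_n to reach its maximum."""
    kappa = (s_next - s_n) / (s_n - s_prev)
    g = gamma(q, p, alpha)
    return -math.log(g * g * kappa) / ((s_next - s_prev) * q ** nu)

def speed(q, nu, p=0.01, alpha=1, dsigma=2.0):
    """Eq. (6): diffusion speed for kappa ~ 1; dsigma = sigma_{n+1} - sigma_{n-1}."""
    return -dsigma * q ** nu / (2.0 * math.log(gamma(q, p, alpha)))

nu = 100
q_fig = 0.99 ** (1.0 / nu)                      # per-base fidelity for Q = 0.99
T = time_to_maximum(q_fig, nu, 3.0, 4.0, 5.0)   # x_4 with sigma_i = i
print(f"T for the Fig. 2 parameters: {T:.2f}")  # about 14

# Scan Q = q**nu on a grid and pick the fidelity with the largest V.
Q_grid = [0.5 + 0.0005 * k for k in range(1, 1000)]
best_Q = max(Q_grid, key=lambda Q: speed(Q ** (1.0 / nu), nu))
print(f"V(q) is largest near Q = {best_Q:.3f}")
```

The first number reproduces the $T \approx 14$ quoted in the caption of Fig. 2. The scan locates the maximum that Fig. 3 illustrates.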
The optimal copying fidelity $q_{opt}$ is defined to maximize the diffusion speed, and can
therefore be derived by finding the maximum of $V(q)$ in Eq. (6) (see Fig. 3). Setting the
derivative to zero, $\frac{dV}{dq} = 0$, and noting that $q \approx 1$ gives the equation:

$$
\nu (1-q) = -\frac{1}{\frac{1}{\alpha}\ln(p) + \ln(1-q)}
\tag{7}
$$



We are interested in the limit where the genome length is large. In this limit the first
term in the denominator (involving $p$) can be neglected. Eq. (7) then reduces to

$$
\nu (1-q) \ln(1-q) = -1
\tag{8}
$$



There is no closed analytic expression for the solution to this equation, but a converging
iterative expression can be found for the optimal copying fidelities

$$
q_{opt} = 1 - \frac{1}{\nu \ln\left(\nu \ln\left(\nu \ln(\cdots)\right)\right)},
\qquad
Q_{opt} \approx e^{-1/\ln \nu}
\tag{9}
$$



It is surprising that the optimal genomic copying fidelity depends so weakly on the genome
length, and even more surprising that it is independent of all other parameters in the
model. This independence is both interesting and important, especially since we start by
assuming a specific path for evolution. As it turns out, the optimal mutation rate does not
depend on the particular path chosen. The insensitivity of $Q_{opt}$ when the genome length
varies can be seen by considering biologically plausible genome lengths, see Fig. 5. Note
that the genomic copying fidelity increases with genome length.
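
The nested-logarithm expression can be evaluated as a fixed-point iteration $L \leftarrow \ln(\nu L)$, with $1 - q_{opt} = 1/(\nu L)$. The sketch below does this for genome lengths between $10^3$ and $10^{10}$ and compares the resulting $Q_{opt} = q_{opt}^\nu$ with the leading-order form $e^{-1/\ln \nu}$. The function name `mu_opt`, the iteration count and the chosen genome lengths are assumptions made for illustration.

```python
import math

def mu_opt(nu, iterations=50):
    """Solve nu * mu * ln(mu) = -1 for mu = 1 - q via L <- ln(nu * L)."""
    L = math.log(nu)                  # innermost truncation of the nested logarithm
    for _ in range(iterations):
        L = math.log(nu * L)
    return 1.0 / (nu * L)

rows = []
for exponent in range(3, 11):
    nu = 10.0 ** exponent
    mu = mu_opt(nu)
    Q_iterated = math.exp(nu * math.log1p(-mu))   # Q = q**nu, computed stably
    Q_leading = math.exp(-1.0 / math.log(nu))     # leading-order form
    rows.append((nu, mu, Q_iterated, Q_leading))
    print(f"nu = 1e{exponent:<2d}  mu_opt = {mu:.3e}  "
          f"Q (iterated) = {Q_iterated:.3f}  Q (leading order) = {Q_leading:.3f}")
```

Both columns stay within a narrow band around 0.9 over seven orders of magnitude in genome length and increase slowly with $\nu$. The fully iterated value lies somewhat above the leading-order one.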




FIG. 4. The figure shows the region where $V(q)$ has a maximum, calculated by numerical
simulations of Eq. (1). Parameter settings in the simulations were $p = 0.01$, $\alpha = 1$,
N = 10 s and $\nu = 1000$. The optimum occurs approximately at the point predicted by
Eq. (9), i.e., $Q_{opt} = 0.86$.

FIG. 5. The figure demonstrates how weakly $Q_{opt}$ varies with genome length.
In simulations of a population consisting of 2000 individuals with genome length $\nu = 70$
on a rugged fitness landscape (created by an elementary folding algorithm for calculating
secondary structures of gene sequences), Fontana and Schuster find that the rate of
evolution is maximal approximately at $\mu = 0.003$. This is in close agreement with the
mutation rate predicted by Eq. (9) for genome length 70, $\mu_{opt} = 0.0025$.
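
The quoted value can be checked by truncating the nested logarithm $1 - q_{opt} = 1/\big(\nu \ln(\nu \ln(\cdots))\big)$ at a finite depth. The helper `mu_truncated` and the depths shown are illustrative choices, and reading $\mu$ as the per-base mutation rate $1 - q$ is an assumption made here.

```python
import math

def mu_truncated(nu, depth):
    """Per-base mutation rate 1 - q using `depth` nested logarithms."""
    L = math.log(nu)
    for _ in range(depth - 1):
        L = math.log(nu * L)
    return 1.0 / (nu * L)

estimates = {depth: mu_truncated(70, depth) for depth in (1, 2, 3, 30)}
for depth, mu in estimates.items():
    print(f"depth {depth:2d}: mu = {mu:.5f}")
```

Two nested logarithms give about 0.0025, and the fully converged iteration about 0.0024. Both are close to the simulated optimum of roughly 0.003.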

The optimal copying fidelity given in Eq. (9) can also be derived using a more intuitive
argument. The argument also shows more clearly how evolved mutation rates on static fitness
landscapes relate to evolved mutation rates in dynamic environments. The rate of growth
between two peaks, with fitness difference $\Delta\sigma$, is given by
$e^{Q \Delta\sigma t}$. The diffusion from an occupied peak to the next is proportional to
$(1-q)^\alpha$, where $\alpha$ measures the distance in sequence space between the peaks.
The time, $T$, it takes for a population to evolve from one peak to another will therefore
be given by the solution of the equation $(1-q)^\alpha e^{Q \Delta\sigma T} = 1$, i.e.,
$T = -\frac{\alpha \ln(1-q)}{Q \Delta\sigma}$. Organisms, free to change their mutation
rates, evolve a copying fidelity $q_{opt}$ that minimizes $T(q)$. Deriving an expression for
the equation $\frac{dT}{dq} = 0$, using $q^\nu \approx 1$, gives
$\frac{1}{1-q} + \nu \ln(1-q) = 0$, which is equivalent to Eq. (8) and is solved by
$Q_{opt} \approx e^{-1/\ln \nu}$.
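
The intuitive argument can be checked by minimizing $T(q) = -\alpha \ln(1-q) / (Q \Delta\sigma)$ directly. The sketch below scans log-spaced per-base mutation rates $\mu = 1 - q$ and reports the minimizer. The genome length $\nu = 1000$, the grid, and the function names are assumptions made for illustration; $\alpha$ and $\Delta\sigma$ only rescale $T$ and do not move the minimum.

```python
import math

def crossing_time(mu, nu, alpha=1, dsigma=1.0):
    """T = -alpha * ln(1 - q) / (Q * dsigma), written with mu = 1 - q."""
    Q = math.exp(nu * math.log1p(-mu))           # Q = q**nu
    return -alpha * math.log(mu) / (Q * dsigma)

def argmin_mu(nu, points=4000):
    """Grid search over mu between 1e-3/nu and 10/nu (log-spaced)."""
    grid = [10 ** (-3 + 4 * k / points) / nu for k in range(points + 1)]
    return min(grid, key=lambda mu: crossing_time(mu, nu))

nu = 1000
mu_star = argmin_mu(nu)
Q_star = math.exp(nu * math.log1p(-mu_star))
print(f"numerical minimum: mu = {mu_star:.3e}, Q = {Q_star:.3f}")
print(f"nu * mu * ln(mu)   = {nu * mu_star * math.log(mu_star):.3f}   (Eq. 8 asks for -1)")
print(f"exp(-1 / ln(nu))   = {math.exp(-1.0 / math.log(nu)):.3f}")
```

The minimizer satisfies $\nu(1-q)\ln(1-q) \approx -1$ to within the grid resolution, and the corresponding $Q$ is close to, though slightly above, the leading-order estimate $e^{-1/\ln \nu}$.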

In a recent paper [10], the evolution of mutation rates on a dynamic fitness landscape was
studied. The fitness landscape consists of a single peak moving around in sequence space,
shifting position on average once every $\tau$ generations. The relative selective
advantage for a sequence on the fitness peak is $\sigma$. A shift of the peak consists of
$\alpha$ changes of bases in the sequence defining the fitness peak. Since an individual in
the population needs to produce offspring that are able to follow the shifts of the fitness
peak, a non-zero mutation rate is selected for. It turns out that finding the optimal
copying fidelity is equivalent to maximizing $(1-q)^\alpha e^{q^\nu \sigma \tau}$ with
respect to $q$. This is the same expression as for the growth rate between fitness peaks on
a static landscape. However, in the dynamic case the growth over a cycle, consisting of a
shift and a static period, is to be optimized rather than the time to evolve from one peak
to the next. More generally, if the evolution of the mutation rate is driven by a dynamic
environment it will be selected to optimize the growth on the changing fitness landscape,
whereas on a static landscape the mutation rate maximizing the rate of evolution towards
fitter genotypes will be selected for. Maximizing the growth on a dynamic landscape gives
$Q_{dyn} = e^{-\alpha/(\sigma\tau)}$.
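
A minimal numerical sketch of the dynamic case follows, assuming the growth over one cycle is $(1-q)^\alpha e^{q^\nu \sigma \tau}$ as read from the expression above. Setting the derivative of its logarithm to zero with $q \approx 1$ gives $1 - q \approx \alpha/(\nu\sigma\tau)$ and hence $Q \approx e^{-\alpha/(\sigma\tau)}$. The parameter values ($\nu = 1000$, $\alpha = 1$, $\sigma = 5$, $\tau = 10$) and the function names are hypothetical and serve only as an illustration.

```python
import math

def log_cycle_growth(mu, nu, alpha, sigma, tau):
    """ln[(1 - q)**alpha * exp(q**nu * sigma * tau)], written with mu = 1 - q."""
    Q = math.exp(nu * math.log1p(-mu))
    return alpha * math.log(mu) + Q * sigma * tau

def q_dyn_numeric(nu, alpha, sigma, tau, points=4000):
    """Grid search over mu between 1e-4/nu and 1/nu (log-spaced)."""
    grid = [10 ** (-4 + 4 * k / points) / nu for k in range(points + 1)]
    mu = max(grid, key=lambda m: log_cycle_growth(m, nu, alpha, sigma, tau))
    return math.exp(nu * math.log1p(-mu))

nu, alpha, sigma, tau = 1000, 1, 5.0, 10.0      # hypothetical values
Q_numeric = q_dyn_numeric(nu, alpha, sigma, tau)
Q_closed = math.exp(-alpha / (sigma * tau))
print(f"numerical optimum: Q = {Q_numeric:.4f}")
print(f"closed form:       Q = {Q_closed:.4f}")
```

For these values both results lie near 0.98, well above the static optimum of about 0.9, which illustrates how different the two selection pressures are.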

There are some fundamental differences between the two models presented above. In the model
based on dynamic fitness landscapes the population dynamics is driven by external changes of
the environment. The organisms have to passively wait for the environment to change and
then adapt to the new fitness landscape. In the model based on rugged fitness landscapes the
situation is different. There always exists a higher fitness peak in the close neighborhood,
so the population has to minimize the time for diffusing to and growing large on the higher
peak. Hence the population should actively search the surroundings in sequence space. The
main difference between the models is therefore the pre-existence of higher fitness peaks
close in sequence space, which results in very different optimal mutation rates.

The genomic copying fidelity in both the static and dynamic case is approximately
independent of genome length, a phenomenon that is also observed in nature. To be more
precise, experiments show that the genomic copying fidelity is approximately constant within
groups of similar organisms, e.g., simple DNA-based organisms have $Q \approx 0.996$ whereas
RNA-based retroviruses have $Q \approx 0.9$ [1,2]. Simple DNA-based organisms, for example,
have mutation rates much too low to be explained by evolution on the static landscapes
studied in this paper. Retroviruses, on the other hand, show mutation rates that are in
agreement with the predictions made in this paper. However, they may also be explained by
mutation rates evolved as a response to a changing environment, as discussed above. It is
therefore unclear whether the major force behind the evolution of mutation rates for
retroviruses is maximizing the evolution rate towards higher fitness or maximizing the
growth in a changing environment (caused by the immune system).

In conclusion, we show that the optimal genomic copying fidelity, i.e., that which optimizes
the rate of evolution, on a rugged fitness landscape can be written as
$Q_{opt} = e^{-1/\ln(\nu)}$, where $\nu$ is the genome length. The optimal genomic copying
fidelity on rugged fitness landscapes is predicted to be around 0.9 for realistic genome
lengths ($\nu \in [10^3, 10^{10}]$). Of the mutation rates observed in nature, those of
retroviruses (including HIV) confirm this prediction. The model presented here therefore
offers a possible explanation for the observed mutation rates for retroviruses.